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Abstract 

Background: Risk adjusted mortality for intensive care units (ICU) is usually estimated via logistic regression. Random effects 
(RE) or hierarchical models have been advocated to estimate provider risk-adjusted mortality on the basis that standard 
estimators increase false outlier classification. The utility of fixed effects (FE) estimators (separate ICU-specific intercepts) has 
not been fully explored. 

Methods: Using a cohort from the Australian and New Zealand Intensive Care Society Adult Patient Database, 2009-2010, 
the model fit of different logistic estimators (FE, random-intercept and random-coefficient) was characterised: Bayesian 
Information Criterion (BIC; lower values better), receiver-operator characteristic curve area (AUC) and Hosmer-Lemeshow (H- 
L) statistic. ICU standardised hospital mortality ratios (SMR) and 95%CI were compared between models. ICU site 
performance (FE), relative to the grand observation-weighted mean (GO-WM) on odds ratio (OR), risk ratio (RR) and 
probability scales were assessed using model-based average marginal effects (AME). 

Results:lhe data set consisted of 145355 patients in 128 ICUs, years 2009 (47.5%) & 2010 (52.5%), with mean(SD) age 
60.9(18.8) years, 56% male and ICU and hospital mortalities of 7.0% and 10.9% respectively. The FE model had a BIC = 64058, 
AUC = 0.90 and an H-L statistic P-value = 0.22. The best-fitting random-intercept model had a BIC = 64457, AUC = 0.90 and H- 
L statistic P-value = 0.32 and random-coefficient model, BIC = 64556, AUC = 0.90 and H-L statistic P-value = 0.28. Across ICUs 
and over years no outliers (SMR 95% CI excluding null-value = 1) were identified and no model difference in SMR spread or 
95%CI span was demonstrated. Using AME (OR and RR scale), ICU site-specific estimates diverged from the GO-WM, and the 
effect spread decreased over calendar years. On the probability scale, a majority of ICUs demonstrated calendar year 
decrease, but in the for-profit sector, this trend was reversed. 

Conclusions: The FE estimator had model advantage compared with conventional RE models. Using AME, between and 
over-year ICU site-effects were easily characterised. 
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Introduction 

Risk-adjusted mortality has been used to characterise the 
performance of health care providers for a number of years [1] 
and has generated a substantial [2] if not controversial [3] 
literature. Inference regarding risk-adjusted mortality is dependent 
on both the illness severity measure [4,5] and the estimation 
method [6,7]. Mortality probability estimation usually proceeds 
via conventional logistic regression [8] but a call for "Improving 
the statistical approach to health care provider profiling", in 
particular the use of Bayesian methods, was made some 15 years 
ago [9]. Advances in standard statistical software packages have 
made such approaches feasible and a random effects or 



hierarchical approach to estimation, both Bayesian and frequen- 
tist, has recently been advocated [10,11] and implemented [12]. 

However, such recommendation must also address certain 
cautions recently advanced regarding the latter methods [13,14], 
in particular the reduction of variation of hospital performance by 
the shrinkage effect of conventional random effects models. In a 
wide-ranging discussion Ash and co-workers (The COPSS 
[Committee of Presidents of Statistical Societies] -CMS [Centers 
for Medicare and Medicaid Services] White Paper Committee, 
[15]) noted that in the presence of sufficient stand-alone hospital 
data and an appropriately specified model, a fixed effects 
approach (in this case, separate hospital-specific intercepts) would 
ensure "...successful adjustment for potential confounding [15]." 
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Figure 1. Binned residual plots. Binned residual plots [46] for FE, random intercept and random coefficient models: y-axis, average residual 

(expectation =0); x-axis, average predicted mortality probability. 

doi:10.1371/journal.pone.0102297.g001 



Such endorsement has been reiterated by the empirical demon- 
stration of the efficacy of such a fixed effects approach [6,16-20]. 
This being said, the interpretation of (3 coefficients (log-odds ratios 
or odds ratios) from such a fixed effects model as "substantive 
effects" may be problematic due to unobserved heterogeneity and 
confounding of effects [21]. Furthermore, as argued by Angrist, 
structural parameters (that is, the P coefficients) may be of 
theoretical interest, but must be "...converted into causal effects if 
they are to be of use for policy evaluation or determining whether 
a trend association is causal" [22]. 

The Australian and New Zealand Intensive Care Society 
(ANZICS) adult patient data base (APD) [23], administered by the 
Centre for Outcome and Resource Evaluation (ANZICS CORE) 
[24], is a high-quality bi-national intensive care patient data-base, 
and satisfying the above data requirements, would be entirely 
suited to such a modelling approach. Using recent data from this 
data-base (calendar years 2009-2010), the purpose of this paper 
was to (i) develop a predictive fixed effects logistic model, 
enumerate its properties and compare these with conventional 
random effects models and (ii) characterise the relative perfor- 
mance of ICUs (with respect to mortality outcomes), using the 
fixed effects model, on the probability and other scales using 
average marginal effects (AME) [21,25,26] or "marginal stan- 



dardisation" [27], adjusting for the multiple comparisons so 
undertaken [28]. 

Methods 

Ethics statement 

Access to the data was granted by the ANZICS Database 
Management Committee in accordance with standing protocols; 
local hospital (The Queen Elizabeth Hospital) Ethics of Research 
Committee approval was waived. The data set analysed is the 
property of the ANZICS Data base and contributing ICUs and is 
not in the public domain. The data are available to personnel of 
the ANZICS Data base and contributing ICUs under specific 
conditions and upon written request. 

Data management 

As previously described [29,30] the ANZICS APD was 
interrogated to define an appropriate patient set over the time 
period 2009-2010. In brief, physiological variables collected in 
accordance with the requirements of the APACHE III algorithm 
[3 1] were the worst in the first 24 hours after ICU admission, and 
all first ICU admissions to a particular hospital for the period 
2009-2010 were selected. Records were used only when all three 
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Figure 2. SMR and 95%CI by hospital level and calendar year for fixed effects model. Plots of point SMR (standardised mortality ratio) with 
95%CI versus mean (ICU) site volume, by hospital level (rural, metropolitan, tertiary and private) over calendar year (2009, 2010) for the FE model. Null 
line = 1. 

doi:1 0.1 371 /journal.pone.01 02297.g002 



components of the Glasgow Coma Score (GCS) were provided; 
records for which all physiologic variables were missing were 
excluded, and for the remaining records, missing variables were 
replaced with the normal range and weighted accordingly [32]. 
Ventilation status in the data base was recorded with respect to 
invasive mechanical ventilation on or within the first 24 hours of 
ICU-admission. Exclusions: unknown hospital outcome; patients 
with an ICU length of stay = 4 hours, and patients aged < 1 6 
years of age. Continuous variables (age, APACHE III score and 
annual-volume) were centred for model stability considerations. 
Categorical predictors were parameterized as indicator variables 
with the reference level ( = 0) indicated in parentheses in the 
following list: year (2009); gender (female); ventilation (non- 
ventilated); ICU-level, as defined in the ANZICS data dictionary 
[33], as Rural/Regional, Metropolitan, Tertiary and Private 
(Tertiary); geographical-location, that is New Zealand and the 
States of the Commonwealth of Australia (New South Wales 
(NSW), the largest contributor); ICU source, that is, patient 
transfer from another hospital (no transfer); patient surgical status 
as post-elective surgery, post-emergency surgery and non-surgical 
(non-surgical); descriptors of ICU admission primary organ system 
dysfunction, these being a consolidation of the diagnostic 



categories of the Acute Physiology and Chronic Health valuation 
(APACHE) III algorithm: cardiovascular, gastrointestinal, meta- 
bolic, neurologic, respiratory, trauma, renal/genitourinary (car- 
diovascular); ICU site (first site of the sequential numeric ordering 
of ICUs). Annual ("annualised" [34]) volume, determined for each 
ICU recorded in the database, was also considered as a (decile) 
categorical variable (first decile) [30]; see below. 

Statistical analysis 

Analyses were performed using Stata (Version 13, 2013; College 
Station, TX); continuous variables were reported as mean (SD), 
except where otherwise indicated, and statistical significance was 
ascribed at P^O.05. 

Three separate models were estimated: 

(i) logistic regression: for patient 2 in provider k the logit (log- 
odds) of hospital mortality probability ((ln[p,vt/(l —/>*)])) 
was given as: a+pX^ + ItQk, where {X} was a set of 
independent predictor variables and ).k represented the 
additional risk effect of the Ath provider (Q); that is, provider 
effects were fixed [6,35]. Appropriate accounting of patients' 
within ICUs was obtained using the robust cluster variance 



PLOS ONE | www.plosone.org 



4 



July 2014 | Volume 9 | Issue 7 | e102297 



Fixed Effects Modelling in Provider Outcome 



Random intercept model: 2009 Random intercept model: 2010 

Rural ICUs Metropolitan ICUs Rural ICUs Metropolitan ICUs 




;;;;;;; 



77JJ7777JJJJJJJJJ7 7JJ7JJJ7J7J7J7J7J7 JJ7TJ7J77JJJJ7J7JJ 




******** ******** ffjffs&fff ******** sfsssssssf * * ***** * #f4fffj>sfj>4 



Mean site volume 



Mean site volume 



Mean site volume 



Figure 3. SMR and 95%CI by hospital level and calendar year for random intercept model. Plots of point SMR (standardised mortality 
ratio) with 95%CI versus mean (ICU) site volume, by hospital level (rural, metropolitan, tertiary and private) over calendar year (2009, 2010) for the 
random intercept model. Null line =1. 
doi:1 0.1 371 /journal.pone.01 02297.g003 



option [36] of Stata; given as Var=\( £ q^q^j V, 

where V= ( — d 2 InL/d/J 2 ) is the conventional estimator 

of variance, q^ is the contribution of the kth provider to 
dlnL/dp. Assuming an additive likelihood function: 
9k = qfi qj being a row vector of observations [37,38]. 

(ii) random effects (or empirical Bayes) models, random intercept 
and random coefficient, as Pr()>ik = l|2*r) = H(j$Xfc- s rZjk'\Xk) 
for k— \,...,Q providers (provider k consisting of £=l,...,n 
observations). The I xp row vector Xik were the covariates 
for fixed effects and 1 x rvector Zjft the covariates corre- 
sponding to the random effects (u/ c ) and were used to 
represent both random intercepts and random coefficients. 
yik was the binary (0/1) outcome variable (hospital mortality) 
and H the logistic cumulative distribution function. 

a. In the random intercept model, was a scalar 1 . 

b. In the random coefficient ("slope") model, the centred 
APACHE III score (as a dominant predictor of hospital 



mortality [29]) was used; an unstructured covariance 
matrix was implemented (that is, the usual (symmetric) 
variance-covariance matrix which includes components 
of covariance between the random effects), 
c. Model estimation used (7-point) adaptive quadrature, a 
computational method used to approximate the marginal 
likelihood by numerical integration [39]; the modelling 
perspective was frequentist. 

Seasonality of mortality was addressed using trigonometric (sine 
and cosine) terms for yearly, 6 monthly and weekly effects after 
Stolwijk [40]. 

For fixed model variables, detailed above in "Methods", sets of 
parameter coefficients were tested using a global Wald test [41] 
and model development and comparison was guided by the 
Akaike Information Criterion (AIC), with the Bayesian Informa- 
tion Criterion (BIC) for non-nested models (28). In the presence of 
specific (fixed) ICU effects (parameterised as a multilevel 
(indicator) categorical variable), in the FE model only, particular 
attention was directed to the identification of variable collinearity 
with other model fixed effects variables, using the Stata module 
"_rmcoll" [42]. Model adequacy was gauged by the traditional 
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Figure 4. SMR and 95%CI by hospital level and calendar year for random coefficient model. Plots of point SMR (standardised mortality 
ratio) with 95%CI versus mean (ICU) site volume, by hospital level (rural, metropolitan, tertiary and private) over calendar year (2009, 2010) for the 
random coefficient model. Null line =1. 
doi:10.1371/journal.pone.0102297.g004 



criteria of discrimination (receiver operator characteristic curve 
area, AUC) and calibration (Hosmer-Lemeshow (H-L) statistic); 
albeit the H-L statistic will invariably be significant (P<0. 1 and H- 
L statistic > 15.99) in the presence of a large N [43] and 
increments to the grouping number (default = 10) of the H-L test 
were appropriately made [44]. Model residual analysis was 
undertaken using (i) distributional diagnostic plots, specifically 
the comparison of the empirical distribution of the residuals 
against the normal distribution; Q_-Q_and P-P plots [45]) and (ii) 
the "binned residual" approach (initially presented for small 
samples) as recommended by Gelman and Hill [46] : the data were 
divided into categories (bins) based upon the fitted values and the 
average residual (observed minus expected value) versus the 
average fitted value was plotted for each bin; the boundary lines, 
computed as 2y/p( \ —p)/n where n was the number of points per 
bin, indicated ± 2SE bounds, within which one would expect 
about 95% of the binned residuals to fall. 

Confidence intervals (CI) of the ICU standardised mortality 
ratio (SMR) were calculated by back-transformation from the 
variance of the (log) observed / predicted mortality using the 
Taylor series approximation [47]. The multivariate relationships 
(joint distribution) between various estimates were displayed using 
biplots [48]. Biplots consist of lines, reflecting the dataset variables, 



and "dots" to show the observations. The length of the lines 
approximates the variances of the variables (the longer the line, the 
higher is the variance) and the cosine of the angle between the 
lines approximates the correlation; the closer the angle is to 90, or 
270 degrees, the smaller the correlation (orthogonality or un- 
correlated); an angle of 0 or 180 degrees reflecting a correlation of 
1 or —1, respectively [49]. 

Exploration of comparative ICU site performance, by ICU level 
and calendar year, relative to the grand observation-weighted mean 
[15,19] on both the predictive probability (the default), (log) odds 
ratio (OR) and risk ratio (RR) scales was undertaken using the 
"margins" and "contrast" operators of Stata, with the FE logistic 
model. For a non-linear model the marginal effect is not the same as 
the (3 model coefficient and is dependent upon the covariate of 
interest (X) and the values of (all) other model covariates [50,51]. 
The marginal effects so calculated were understood as being (i) 
statistics calculated from predictions of a previously fit model (in this 
case, logistic) at fixed values of some covariates and averaging or 
otherwise integrating over the remaining covariates [21,28] (ii) the 
average of discrete or partial changes over all observations [52] ; that 
is, the average of predictions (AME; the default specification in 
Stata) rather than the predictions at the average of covariates [26] , 
although the latter may also be calculated (as marginal effects at the 
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Figure 5. Boxplots of SMR and 95%CI span for different models. Boxplots of: Upper panel, point estimate of SMR (standardised mortality 
ratio) by model (SMR_FE, SMR for fixed effects model; SMR_Rint, SMR for random intercept model; SMR_Rcoef, SMR for random coefficient model) 
over year (2009, 2010). Lower panel, 95%CI span by model (Clspan_FE, 95%CI span for fixed effects model; Clspan_Rint, 95%CI span for random 
intercept model; Clspan_Rcoef, 95%CI span for random coefficient model) over year (2009, 2010). 
doi:1 0.1 371 /journal.pone.01 02297.g005 



1 - 

mean, MEM). Thus, the AME is given by - P x ]f(P xi ) where 

i= l 

[i x \ is the estimated log(OR) for variable X\ , p xi is the logit for the i- 
th observation and f{ft xl ) is the probability density function (PDF) 
of the logistic distribution with regard to fi xj [21]. As noted by 
Vittinghoff et al [53], the Stata "margins" command estimates 
potential outcomes (= "causal effects") and provides valid 
confidence intervals for the parameters of a (in this case, logistic) 
marginal structural model [54] by averaging over the expected 
outcome values of the actual and potential values of, say, a binary 
treatment variable, holding all other covariates fixed at observed 
values (under the assumption of no residual confounding). We shall 
refer to these margins of responses (or predictions) as predictive 
margins after Graubard & Korn [55]. In particular, for a binary 
covariate x, coded (0/ 1), the marginal mean for x = 1 is obtained by 
considering all the observations of x wherever "x" appears in the 
model (for both direct and indirect effects; and similarly for x — 0); 
j R m 

that is [55]: - ^ exp (i r + fa) / j 1 + exp (i r + fix ik ) j for 
n k= 1 i= 1 

k=l...,R s i=\...,fli, Xjk being the covariate effect for the ith 

R 

individual in the Ath group and n= n i [56,57]. A point of note 

i=\ 



with respect to the models estimated in the current paper; predictive 
margins require that the prediction is a function only of P, the 1 x p 
model coefficient vector (matrix) and the independent (fixed) 
variables, not of stochastic functions (the random effects, Uyt). 

The following effect computations with 95% CI were under- 
taken: (i) OR contrasts; using linear predictions via the "pre- 
dict(xb)" option of the "margins" command and (ii) RR; as the 
ratio of the provider predictive margins divided by the grand 
weighted mean of the predictive margins, via nonlinear combina- 
tion of estimates (the Stata "nlcom" command [58]) and (iii) 
probability contrasts; the grand weighted mean of the predictive 
margins was subtracted from the predictive margin for each (ICU) 
provider. Adjustment of the comparison-wise error rate (individual 
ICU relative to the grand observation-weighted mean) was based 
upon the upper limit of the Bonferroni inequality, a e <ma c , where 
m is the comparison number; the adjusted error rate being 
oc c = !X e /m [28,59], where ol c is the comparison-wise error rate and 
a e is the experiment- wise error rate. 

Results 

The data set consisted of 145355 patient records in 128 ICUs, 
calendar years 2009 (47.5%) & 2010 (52.5%), with mean(SD) age 
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Figure 6. ICU site intercepts (95%CI) on OR and odds scale. FE ICU-site intercepts (95%CI) on OR scale (upper panel) and the ICU-site random 
effects (95%C1) on odds scale (lower panel) from the random coefficient model. Horizontal axis, mean site volume. 
doi:10.1371/journal.pone.0102297.g006 



60.9(18.8) years, APACHE III score 51.4(28.0) and ICU and 
hospital mortalities of 7.0% and 10.9% respectively. Fifty six 
percent were male and 38% were ventilated in the first 24 hours. 
The mean annual patient volume was 758(404); median 623, 
range 168—1701. The largest percentage of patients (42.8%) were 
in tertiary hospitals and resided in the Australian state of New 
South Wales (31.4%). Patient demographics over calendar year by 
hospital level and patient surgical status is seen in Table 1 . More 
patients were admitted in 2010 (n = 76346) versus 2009 
(n = 69009), but the demographics over the 2 year period were 
relatively stable. 

The fixed effects model (190 parameters, including 127 separate 
ICU site parameter estimates) had a BIC =64057.9, an AUC 
= 0.90 and an H-L statistic = 18.3 (P = 0.22; grouping number 
= 40). The continuous variable "annual-volume" and the 
categorical variables "ICU-level" and geographical-location" were 
identified as collinear and removed from the dependent variable 
list. When "annual- volume" was parameterised as a decile 
categorical variable, there was a model AIC increment of 5 (all 
parameter P -values (n = 9) were >0.1) and the variable was not 
further considered. A global Wald test of the 6 trigonometric 
seasonality parameters was significant at P = 0.004. A random 
intercept model with the identical independent variable list 
(excluding the ICU site variable as a categorical variable) had a 



BIC =64457, an AUC =0.90 and an H-L statistic =41.6 
(p = 0.32; grouping number = 40). A random coefficient model 
(random intercept as ICU site, random coefficient (slope) as 
centred APACHE III score; unstructured covariance), including 
the variables "annual-volume" (continuous) and "ICU-level" and 
"geographical-location" (categorical) had a BIC = 64555.8, an 
AUC =0.90 and an H-L statistic =42.8 (£ = 0.28; grouping 
number = 40). Both RE models satisfied the assumption of 
normality of random effects estimates (see File SI). Graphical 
display of the binned residual plots of the three models is seen in 
Figure 1; in terms of residual percentage outside boundary lines, 
there was slight advantage for the FE model (3.33%) versus the 
random intercept (3.84%) and random coefficient (3.85%) models. 
Overall there was some statistical advantage of the fixed effects 
model, none the least in terms of computational speed: FE model, 
9 seconds; random intercept model, 1 .8 hours; random coefficient 
model, 1 1.8 hours (computed on a 64-bit PC using an 8-core Intel 
i7-3960X CPU, clock speed 3.30 GHz). Details of parameter 
estimates for all three models (fixed effects, random intercept and 
random coefficient) have been included in File SI. 

Using the FE model, plots of ICU SMRs and CI by hospital 
level for the two calendar years, 2009 and 2010, are seen in 
Figure 2. There was evidence for contraction of the CI spread 
across the years, more so in the private ICUs. Of interest, no ICU 
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Figure 7. Fixed effects ICU mortality OR (95%CI) by hospital level and calendar year. Plots of predictive ICU mortality OR (95%CI) versus 
mean (ICU) site volume, by hospital level (rural, metropolitan, tertiary and private) over calendar year (2009, 201 0) for the FE model. Grand mean null 
line =1. 
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was identified as an outlier with respect to the null ( = 1) in either 
year. Similarly, for the random intercept and coefficient models, 
no statistical outliers were identified (Figures 3 and 4, respectively). 
Box plots of SMR point estimates (left panel) and 95% CI span 
(right panel) by model and year are seen in Figure 5. Shrinkage of 
point estimates for all three models is seen, 2010 versus 2009, but 
no striking difference between models; the random coefficient 
model having the greater spread of point estimates. Confidence 
interval span width tended to increase, 2010 versus 2009, and all 
models displayed "extreme" span widths. A comparison of the 
(model-based) FE ICU-site intercepts and the ICU-site random 
effects from the random coefficient model is seen in Figure 6, 
demonstrating point-estimate shrinkage for the RE model. 

Predictive margins analysis (OR scale with Bonferroni control of 
multiple comparisons) of ICUs relative to the grand mean, by 
hospital level for the calendar years 2009 and 2010 is seen in 
Figure 7. The spread of the ICU OR estimates relative to the 
grand mean (y-line = 1) was seen to decrease over years 2009 to 
2010, more evident for tertiary and private ICUs. As an 
illustration of the versatility of the margins command, we include 
two further graphics. Figure 8 shows risk ratio estimates relative to 
the grand mean, by hospital level for the calendar years 2009 and 
2010, displaying similar characteristics with respect to the over- 
year spread of estimates as in Figure 7; and Figure 9, which 



demonstrates on the probability scale, by hospital level, formal 
over-year contrasts (calendar year 2010 versus 2009) of the 
predictive margins with respect to the grand mean, with 
Bonferroni control of multiple comparisons. A majority of ICUs 
in the rural / regional, metropolitan and tertiary levels demon- 
strated a decrease in predicted probability over the 2 calendar 
years, but in the private sector, this trend was reversed. 

Discussion 

Using a fixed-effects logistic model to generate provider 
mortality probabilities in a large data-base over a two year period 
we were unable to demonstrate (i) substantive advantage for a 
conventional random effects approach and (ii) oudier status for any 
of the ICUs. These findings deserve further comment. 

Multiple studies have compared fixed and random effects 
estimators in assessing provider performance, the key performance 
indicator usually being the (log)-SMR [5,17,18,60-62], although 
standardisation [63] has not been undertaken in some studies and 
the user-specific (log)-OR has been advocated [64] and utilised in 
provider comparison [10,20,65]. The calculation of the SMR in 
the current context is equivalent to indirect standardisation, direct 
standardisation being "practically impossible when multiple 
predictors are included in the case-mix adjustment model" [66], 
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Figure 8. Fixed effects ICU mortality RR (95%CI) by hospital level and calendar year. Plots of predictive ICU mortality risk ratio (RR, 95%CI) 
versus mean (ICU) site volume, by hospital level (rural, metropolitan, tertiary and private) over calendar year (2009, 2010) for the FE model. Grand 
mean null line = 1 . 
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and the former method may not sufficiently adjust for case-mix 
difference / confounding [15,67]. This being said, formal model 
development, in terms of appropriate covariates [12] and model 
assessment has been quite variable in the literature: in particular, 
the lack of adjustment for mechanical ventilation status, patient 
transfer, diagnostic categories and seasonality; and little extension 
beyond reporting of conventional AUC and H-L statistic [68] in 
terms of model performance. Under the FE model and in the 
presence of large numbers of providers, for instance >4000 as 
reported by Ash and co-workers [15], there may be concerns 
regarding estimator consistency, but such concerns would appear 
to be more apparent than real [69,70] . Unlike other studies using a 
FE approach, we accounted for within-ICU patient correlation 
using the cluster-variance option of Stata to obtain unbiased 
variance estimators ([71], see Statistical analysis, above). 

A number of papers have suggested that " non-hierarchical" 
estimators increase the possibility of false outlier classification 
[10, 1 1,67], and the shrinkage of RE estimators has been accepted 
as a virtue in that it would result in a ". . .more accurate estimate of 
a provider's unobserved true performance ..." [11], although there 
has been disquiet at the very consequences of this feature [13,14], 
In the current study, as shown in Figure 3, we were unable to 
demonstrate this reported characteristic of RE models compared 
with FE, although shrinkage of the point estimates of the RE 



intercepts compared with the FE ICU-site intercepts was quite 
evident (Figure 6), albeit with wide 95%CI (on the odds scale). 
Apropos this point of contention, a recent combined simulation 
and empirical study has reported the FE estimator to provide 
"...high power to identify providers with exceptional outcomes or 
to estimate the magnitude of the difference from expected for such 
exceptional providers..." [16]. One aspect of the current study 
that may inform the lack of demonstration of mortality outliers on 
the SMR scale was the database that we utilized; a binational 
Intensive Care data base which was different from that of, say, the 
COPSS-CMS authors [15] and other papers where specific 
medical diagnoses in a variety of general hospitals with quite 
variable (and small) cluster size were addressed. Minimum 
annualised ICU volume was modest at 168 patients (equivalent 
to 3 ICU admissions per week); that is, there were no extreme 
oudiers with respect to ICU "cluster" size, although the number of 
clusters was adequate [72]. We have previously drawn attention to 
the implications of these particular (Australia and New-Zealand) 
intra-hospital ICU characteristics when addressing the volume- 
outcome question [30]. The recent findings of Madigan et al, that 
"clinical studies that use observational databases can be sensitive 
to the choice of database" gives credence to such cautions [73]. 

The current study would appear to be one of the first to assess 
provider performance exploiting predictive margins, the use of the 
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Figure 9. ICU mortality probability contrasts by hospital level and calendar year. Plots of predictive ICU mortality probability contrasts 
(calendar year 2010 versus 2009) by hospital level (rural, metropolitan, tertiary and private). Bonferroni control of multiple comparisons (see 
"Statistical analysis", above). 
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which has distinct advantages: the ability to resolve problems 
inherent in prediction across categorical predictors (for example, 
the prediction of the "average" gender effect [74]), that is the 
average effect versus the effect at the average covariate value, or 
AME versus MEM (see the discussion of the AME in Statistical 
analysis, above); seamless computation of ICU-effects with respect 
to the grand mean on both the OR and RR scale; the ability to 
formally estimate year-to-year changes in (predicted) mortality as 
precisely displayed in Figure 9 where over-year changes are seen in 
tertiary (decrease) and private (increase) ICUs; the ability to define 
general risk levels in ICU strata, this being evident in the general 
increase in RR for tertiary ICUs (Figure 8), a finding consistent 
with our previous observations for ventilated patients in this 
database [30]; and the computationally simple use of adjustment 
for multiple comparisons within a modelling framework. An 
extension of the latter (not presented in the current study) would 
be the vexed problem of specific between-provider comparisons in 
so-called caterpillar plots [75-77]; within the margins framework 
this may be easily accomplished using pairwise comparisons (the 
Stata "pwcompare" module, with say, Bonferroni adjustment 
[78]). Inference from these model based estimates is thus of some 
value in assessing provider performance and may be contrasted 



with, and are orthogonal to (see Figure 10), that provided by the 
SMR, the statistical properties of which (for example, variance 
estimation), being non-model based, are somewhat problematic 
[12,79]. The SMR is a dimensionless measure of provider 
outcome and is therefore valid for direct comparisons across 
providers. Indeed, the SMR may be regarded as the 'canonical 
residual provider effect' and should be uncorrelated with the 
model-based estimates. 

The statistical advantages of the FE approach compared with 
the RE were quite modest, although computational speed and 
simplicity recommended the former. As noted in Results, above, 
ICU level (and geographical region) were unable to be explicidy 
fitted in the FE model due to confounding / collinearity, but were 
"recovered" within the model based predictive margins analysis. 
Such confounding does not arise with RE modelling and FE and 
RE modelling approaches might be best characterised as 
complementary, rather than comparative. 

Conventional one-stage RE (and FE) estimation considers both 
"usually" and "unusually" performing providers, leading to 
inflated random effect variance estimates and the inability to 
properly account for the latter provider-type ("unusual") in 
estimation. A staged approach to estimation which includes a 
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Figure 10. Biplot of multivariate estimate relationships. Biplot demonstrating multivariate relationships between variables (fixed effects SMR 
(SMR_FE), predictive margins probability (Margins_probability), predictive margins odds ratio (Marginns_OR0 and predictive margins risk ratio 
(Margins_RR)) represented by arrow-headed lines; observations are represented by "dots" (see "Statistical analysis", above). 
doi:10.1371/journal.pone.0102297.g010 



"null" model describing the behaviour of "usual" providers would 
appear to be apposite [12]; such an approach may also 
accommodate over-time analysis [80]. Similarly, the simplistic 
claim that the shrinkage process of RE estimators mitigate against 
multiple comparisons [67], elides the real problems of false 
discovery rate and regression to the mean, both of which must be 
formally handled within a RE scenario [12]. As with the FE 
estimator, specific requirements of the RE estimator are rarely 
tested; the distribution of the RE, as reflected in the gradient 
function [81], and (for the intercept only RE model) lack of 
correlation between random intercepts and patient case-mix 
[15,82]. In the presence of such a correlation, which it is plausible 
to think may commonly occur, the performance of the RE 
estimator is "...adversely affected..." [16]. 

The developed FE model had advantage compared with the 
conventional RE models and disclosed no ICU performance 
outliers in calendar years 2009-2010. Current developments in 
RE estimation, which embrace a "null" model and adjust for the 
false discovery rate and regression to the mean, are superior to a 
single application of a RE (or FE) model, but are considerably 
more complex statistically and computationally. Analysis using 
predictive margins allows substantial inferential insight into 
provider performance. 



Supporting Information 

File SI This file contains supporting information 
including Table SI -Table S3, Figure SI, and Figure S2. 

Table SI, Model estimates: fixed effects. Table S2, Model 
estimates: random intercept. Table S3, Model estimates: random 
coefficient. Figure S 1 , Standardized normal probability plots (P— P 
plot) of the random effects; random intercept model. Figure S2, 
Standardized normal probability plots (P-P plot) of the random 
effects; random coefficient model, a) Random effects for ICU site: 
APACHE III score, b) Random effects for ICU site: constant. 
(DOCX) 
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